Euclidean Bus Mobility and Route Optimization, A Comparison
Routes in Queens, New York City, NY
Author
Alan Vlakancic
Published
November 30, 2025
Introduction
This project uses stplanr transport modeling package to design an optimal transport route for bus or cycle routes in New York City. Stplanr is a transport planning visualization R package that can be used to plan transit networks, in addition to transit planning elements such as identifying transit catchment areas, origin/destination data and ride frequency visualizations, among others. In their white paper, the devlopers of stplanr call for an accountable, transparent and democratized transit planning system that doesn’t rely on proprietary and often vastly different data sources and data processing softwares. Although the package can visualize a whole host of data, this project will focus on comparing direct desire lines or “Euclidean” routes (as the crow flies), existing bus networks and stplanr’s optimized routes. To wit: this can map the efficiency of the bus routes compared to the most direct route possible if there were no built environment factors in the way.
Methods
To adequately compare desire lines, bus routes, and the most efficient routes with the current street network, this project will require at minimum three data sources. Each of these will be sourced separately and overlaid onto each other:
Data Source: leaflet data for basemap
Methods: This can be sourced directly into R by installing the leaflet package. It provides an interactive background map that can be used to overlay other spatial data, and the options can be toggled on and off for ease of use.
Data Source: OpenStreetMap (OSMR) data for transit data, either bus or cycle routes, this is used as the route vector data.
Methods: This can be sourced directly into R by installing the package. You may need to rationalize different projection systems to make sure they overlay correctly.
Data Source: NYC Open Data for Bus Shelter locations
Despite significant searching, there is no comprehensive bus stop dataset, so the project will focus on bus stop shelters, which are mapped via NYC Open Data. I used Bus Shelters as there would be thousands upon thousands of bus stops in NYC, and this would be too computationally intensive to process.
Methods: This can be brought into R as a CSV file. Each bus stop shelter has longitude and latitude coordinates that align with leaflet and OpenStreetMap projections.
Load the necessary R packages for spatial data manipulation and visualization (e.g., ggmap, dplyr, stplanr, osmdata, sf, leaflet).
Import the NYC basemap shapefile and bus shelter CSV data into R.
Convert the bus shelter data into an sf object with appropriate coordinate reference system (CRS).
Terms:
Desire Lines: Straight lines connecting origin and destination points, representing the most direct path between them.
Euclidean: Direct points between shelters. “As the crow flies”.
OSRM: Open Street Routing Machine, a routing engine that uses Open Street Map data to calculate routes, shortest routes, travel times, and can be used to make travel time maps, distance routing for car, bike and walking.
Interactive Map:
Show code
library(ggmap)library(dplyr)library(stplanr)library(osmdata)library(sf)library(tidyverse)library(leaflet)library(purrr)library(kableExtra)library(scales)bbox <-c(left =-73.96, bottom =40.54, right =-73.70, top =40.81)#create bounding box for NYCnyc_map <-"data/"##nyc basemap, downloaded from nyc open data. source: https://search.r-project.org/CRAN/refmans/ptools/html/nyc_bor.htmlnyc_sf <-st_read(nyc_map)
Reading layer `nybb' from data source
`C:\Users\vlakanah\OneDrive - Alfred State College\01. Semesters\14. 2025 FA\GEO511\GEO511_FinalProject\data'
using driver `ESRI Shapefile'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 913175.1 ymin: 120121.9 xmax: 1067383 ymax: 272844.3
Projected CRS: NAD83 / New York Long Island (ftUS)
Show code
#bring in nyc_map as a sfshelters_sf <-read_csv("data/Bus_Stop_Shelter_20251020.csv")#NOTE: REPLACE WITH DATA WHEN USING QUARTO!#this brings in the bus stop shelter information. source: https://data.cityofnewyork.us/Transportation/Bus-Stop-Shelters/qafz-7myzshelters_sf_fix <-st_as_sf(shelters_sf, coords =c("Longitude","Latitude"), crs =4326)#convert to sf object with the correct coordinate reference systemosmdata::set_overpass_url("https://overpass-api.de/api/interpreter")#set overpass url for open street mapsosm_data <-opq(bbox = bbox) %>%add_osm_feature(key ="highway", value =c("primary","secondary")) %>%osmdata_sf()#import data for primary and secondary highways from open street mapsshelters_sf_fix <- shelters_sf_fix %>%mutate(id =paste0("S", row_number()))#add ID column for origin-destination pairs so they have a corresponding numberflow_all <-expand.grid(o = shelters_sf_fix$id, #create origind = shelters_sf_fix$id, #create destinationstringsAsFactors =FALSE) %>%#make sure they aren't factorsfilter(o != d) %>%# this remove self-pairs so O is not Dmutate(trips =1) %>%#add trip count of 1 for each pairsample_n(50) #sample 50 random paris to avoid blowing up the computerdesire_lines_all <-od2line(flow_all, zones = shelters_sf_fix, zone_code ="id") #use od2line function to create desire lines (euclidean) for all pairsshelter_coords <- shelters_sf_fix %>%st_coordinates() %>%as.data.frame() %>%bind_cols(id = shelters_sf_fix$id)#extract coordinates and bind with ID columnroute_single <-function(o_id, d_id) { #function to create a single route between origin and destination o <- shelter_coords %>%filter(id == o_id) #filter to get origin coordinates d <- shelter_coords %>%filter(id == d_id)#filter to get destination coordinates r <-try(route_osrm(from =c(o$X, o$Y),to =c(d$X, d$Y)), silent =TRUE)#use try to catch errors (e.g., no route found)if (inherits(r, "try-error")) return(NULL)#if route found, return the routereturn(r)}routes_list <- purrr::map2(flow_all$o, flow_all$d, route_single)#create routes for all origin-destination pairs using the route_single functionroutes_list <- routes_list[!sapply(routes_list, is.null)]#remove any NULL results (failed routes)routes_sf <-do.call(rbind, routes_list)#combine all routes into a single sf object#BELOW: create a comparison between route lengths and desire line lengths & calculate means and percent changeroutes_projection <-st_transform(routes_sf, 32618)desire_projection <-st_transform(desire_lines_all, 32618)#ensures the correct, projected shapefile for computation not mappingroute_length <-st_length(routes_projection)desire_length <-st_length(desire_projection)#compute lengthsroute_length <-as.numeric(route_length)desire_length <-as.numeric(desire_length)#convert lengths to numeric valueslengths_tbl <-tibble(route_m = route_length,desire_m = desire_length,origin = flow_all$o,destination = flow_all$d)#create tibble to compare lengths in the final map w/ IDslengths_tbl_print <-tibble(route_m =comma(round(route_length)),desire_m =comma(round(desire_length)),origin = flow_all$o,destination = flow_all$d)#tidy data for later printing in a kablemean_route <-mean(route_length, na.rm =TRUE)mean_desire <-mean(desire_length, na.rm =TRUE)#calculate means for both route and desire lengthspercent_change <- ((mean_route - mean_desire) / mean_desire) *100#calculate percent change mean_lengths <-data.frame(type =c("Route Length", "Desire Line Length"),mean_length_m =c(round(mean_route), round(mean_desire)))#put these into a data frame, rounded to whole numbersmean_lengths <- mean_lengths %>%mutate(mean_length_km = mean_length_m /1000,percent_change =c(percent_change, NA) )#add km conversion and percent change to the data frame, i converted to KM for ease of computation (ie: dividing by 1,000)mean_lengths_print <- mean_lengths %>%mutate(mean_length_km =comma(round(mean_length_m /1000)),percent_change =comma(round(percent_change)) )nyc_leaflet <-st_transform(nyc_sf, 4326)roads_leaflet <-st_transform(osm_data$osm_lines, 4326)desire_leaflet <-st_transform(desire_lines_all, 4326)routes_leaflet <-st_transform(routes_sf, 4326)#transform all data to WGS84 for leaflet mappingdesire_leaflet_popup <-paste0("<b>Desire Line</b><br/>","Origin: ", desire_leaflet$o, "<br/>","Destination: ", desire_leaflet$d, "<br/>","Desire Line Distance: ", round(lengths_tbl$desire_m /1000), " km")#create popup info for desire lines for interactive maproutes_leaflet_popup <-paste0("<b>OSRM Route</b><br/>","Origin: ", desire_leaflet$o, "<br/>","Destination: ", desire_leaflet$d, "<br/>","Route Distance: ", round(lengths_tbl$route_m /1000), " km<br/>")#create popup info for routes for interactive mappal_desire <-colorNumeric(palette ="viridis",domain = lengths_tbl$desire_m)#create color palette for desire lines based on distancepal_routes <-colorNumeric(palette ="inferno",domain = lengths_tbl$route_m)#create color palette for routes based on distanceselected_ids <-unique(c(flow_all$o, flow_all$d))#get unique IDs of sampled sheltersselected_shelters <- shelters_sf_fix %>%filter(id %in% selected_ids)#filter shelters to only those that were sampledleaflet() %>%addProviderTiles("CartoDB.Positron") %>%addPolygons(data = nyc_leaflet,color ="black", weight =3,fillOpacity =0.1,group ="NYC Boundary") %>%# Add OSM roads w/ positron backgroundaddPolylines(data = desire_leaflet,color =pal_desire(lengths_tbl$desire_m),weight =3,opacity =0.7,popup = desire_leaflet_popup,group ="Desire Lines" ) %>%addPolylines(data = routes_leaflet,color =pal_routes(lengths_tbl$route_m),weight =3,opacity =0.8,popup = routes_leaflet_popup,group ="OSRM Routes" ) %>%addLegend(pal = pal_desire,values = lengths_tbl$desire_m,title ="Desire Line Distance (m)",position ="bottomright",group ="Desire Lines" ) %>%#add a legend for desire linesaddLegend(pal = pal_routes,values = lengths_tbl$route_m,title ="OSRM Route Distance (m)",position ="bottomleft",group ="OSRM Routes" ) %>%#add a legend for osrm routesaddCircleMarkers(data = shelters_sf_fix,color ="blue",radius =1,popup =~id,group ="Bus Shelters") %>%#add all bus sheltersaddCircleMarkers(data = selected_shelters,color ="purple",radius =6,fillOpacity =0.9,group ="Sampled Shelters" ) %>%#add sampled bus shelters with purple markersaddLayersControl(overlayGroups =c("NYC Boundary", "Sampled Shelters","Desire Lines", "OSRM Routes","Bus Shelters"),options =layersControlOptions(collapsed =FALSE) )
Tables:
Show code
mean_lengths_print %>%kable(col.names =c("Type", "Mean Length (m)", "Mean Length (km)", "Percent Change (%)"),caption ="Mean Lengths of OSRM Routes vs Desire Lines") %>%kable_styling(full_width =FALSE, position ="left")
Mean Lengths of OSRM Routes vs Desire Lines
Type
Mean Length (m)
Mean Length (km)
Percent Change (%)
Route Length
17712
18
20
Desire Line Length
14736
15
NA
Show code
lengths_tbl_print %>%kable(col.names =c("Route Length (m)", "Desire Line Length (m)", "Origin ID", "Destination ID"),caption ="Comparison of Route Lengths and Desire Line Lengths for Sampled Origin-Destination Pairs") %>%kable_styling(full_width =FALSE, position ="left")
Comparison of Route Lengths and Desire Line Lengths for Sampled Origin-Destination Pairs
Route Length (m)
Desire Line Length (m)
Origin ID
Destination ID
3,239
2,704
S1934
S1956
18,208
16,401
S279
S2557
17,575
15,004
S2594
S279
22,644
18,377
S3322
S357
6,621
5,706
S2050
S2062
27,109
12,630
S1432
S2515
28,281
23,475
S953
S2671
29,248
26,729
S2492
S826
11,193
9,890
S2157
S1417
15,280
11,642
S1045
S1649
25,838
18,839
S2459
S2414
14,862
13,688
S1318
S1734
7,680
6,599
S97
S1993
4,479
3,476
S1806
S1314
23,398
21,380
S3219
S2270
2,719
2,251
S685
S358
28,975
20,848
S1123
S3035
15,540
13,822
S34
S3095
13,917
12,144
S2099
S2874
15,012
11,603
S2595
S2163
29,646
19,728
S1228
S2500
20,924
18,431
S395
S2376
12,584
10,552
S2142
S451
12,721
11,506
S2315
S220
22,654
19,688
S1581
S802
16,681
14,683
S903
S2094
9,169
7,908
S1964
S176
20,550
18,946
S597
S2780
4,127
4,031
S702
S2977
6,889
6,196
S2323
S2112
14,295
11,974
S1747
S2281
18,219
15,615
S2195
S666
27,406
20,385
S2379
S1419
1,836
1,470
S2173
S1416
18,436
16,498
S866
S1860
36,554
33,209
S3221
S1550
17,991
15,371
S2143
S1418
26,754
23,599
S3339
S1496
36,112
31,710
S181
S1249
19,182
16,639
S2688
S1666
9,314
7,984
S2300
S2269
27,401
19,561
S336
S2909
15,380
12,775
S1980
S3063
11,844
10,613
S2295
S1479
25,753
22,437
S2249
S197
25,919
19,826
S3233
S210
15,002
13,667
S2388
S1950
9,185
7,487
S930
S2166
17,951
16,412
S244
S2262
23,305
20,678
S1902
S522
Results
The mean route length for the optimized routes for this particular sample run is 17.71 km, while the mean Euclidean desire line length is 14.74 km. This represents a percent change of 20.2% longer for the optimized routes compared to the direct desire lines. The interactive map above visualizes these routes, with desire lines colored based on their lengths and Open Street Routing Machine (OSRM) routes similarly colored with a different theme.
Discussion
The results indicate that the optimized OSRM routes are significantly longer than the direct desire lines, which is generally expected given the constraints of the built environment and road network. The percent change of 20.2% suggests that while the desire lines represent the most direct path between two points, real-world travel must navigate around obstacles, follow roadways, and adhere to traffic regulations etc.
Limitations
This does not represent all bus stops in NYC, just shelters. Although the exact number of bus stops is difficult to find, the MTA states that there are 327 bus routes in the five boroughs and countless stops in between. To make the data manageable both in computation and visualization, this study only selects 50 at random. This limits the amount of data points and does not fully capture the bus network.
The OSRM routing service may not always find a route between two points, especially if they are very close together or in areas with limited road connectivity. The code removes these unroutable routes, and they are not shown in the data.
The analysis does not account for real-world factors such as traffic, hazards, closures, road conditions, bus “bunching” or transit schedules, which can significantly impact actual travel times and route efficiency.The analysis assumes that the shortest path is the most efficient, which may not always be the case in real-world scenarios.
The sample size of 50 origin-destination pairs is relatively small and is not be representative of the entire bus network in NYC.
Only “Primary” and “Secondary” roads are sampled here, as the computation for the smaller roads (tertiary etc.) was too processing heavy. This eliminates a large selection of routes.
Future:
Future research could expand the sample size to include more origin-destination pairs, or even all bus stops. Incorporating real-world travel time data, traffic patterns, and transit schedules could provide a more comprehensive understanding of route efficiency. Further analysis could also explore the impact of different modes of transportation, such as cycling or walking, on route optimization and efficiency.